-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wip(agents-api): Add Doc sql queries #979
base: f/switch-to-pg
Are you sure you want to change the base?
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
CI Failure Feedback 🧐(Checks updated until commit 3600a92)
✨ CI feedback usage guide:The CI feedback tool (
In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:
where Configuration options
See more information about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to 6c77490 in 1 minute and 30 seconds
More details
- Looked at
848
lines of code in10
files - Skipped
0
files when reviewing. - Skipped posting
8
drafted comments based on config settings.
1. agents-api/agents_api/queries/docs/search_docs_hybrid.py:13
- Draft comment:
The import statement forrun_concurrently
is unnecessary since it's not used in the code. Consider removing it to clean up the imports. - Reason this comment was not posted:
Confidence changes required:10%
The import statement forrun_concurrently
is unnecessary since it's not used in the code.
2. agents-api/agents_api/queries/docs/search_docs_hybrid.py:134
- Draft comment:
Appending an empty list totasks
is incorrect. You should append a coroutine or task. Consider usingNone
or a similar placeholder if you intend to skip adding a task. - Reason this comment was not posted:
Confidence changes required:50%
The code usestasks.append([])
which is incorrect for appending tasks. It should append a coroutine instead.
3. agents-api/agents_api/queries/docs/search_docs_hybrid.py:147
- Draft comment:
Appending an empty list totasks
is incorrect. You should append a coroutine or task. Consider usingNone
or a similar placeholder if you intend to skip adding a task. - Reason this comment was not posted:
Confidence changes required:50%
The code usestasks.append([])
which is incorrect for appending tasks. It should append a coroutine instead.
4. agents-api/agents_api/queries/docs/search_docs_hybrid.py:154
- Draft comment:
Ensure that at least one valid task is added totasks
before callinggather
. If bothtext_query
andembedding
are empty,tasks
will contain only empty lists, leading to an error. - Reason this comment was not posted:
Confidence changes required:50%
The code does not handle the case where bothtext_query
andembedding
are empty, which could lead to an error when callinggather
.
5. agents-api/agents_api/queries/docs/search_docs_hybrid.py:110
- Draft comment:
Theembedding
parameter should have a default value of an empty list[]
instead ofNone
to avoid type issues and simplify checks. - Reason this comment was not posted:
Confidence changes required:50%
Theembedding
parameter should have a default value of an empty list instead ofNone
to avoid type issues.
6. agents-api/agents_api/queries/docs/create_doc.py:69
- Draft comment:
Consider adding more exception handling for other potential database errors, such asasyncpg.DataError
orasyncpg.SyntaxOrAccessError
, to make the function more robust. - Reason this comment was not posted:
Confidence changes required:50%
Therewrap_exceptions
decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.
7. agents-api/agents_api/queries/docs/delete_doc.py:43
- Draft comment:
Consider adding more exception handling for other potential database errors, such asasyncpg.DataError
orasyncpg.SyntaxOrAccessError
, to make the function more robust. - Reason this comment was not posted:
Confidence changes required:50%
Therewrap_exceptions
decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.
8. agents-api/agents_api/queries/docs/get_doc.py:29
- Draft comment:
Consider adding exception handling for potential database errors, such asasyncpg.DataError
orasyncpg.SyntaxOrAccessError
, to make the function more robust. - Reason this comment was not posted:
Confidence changes required:50%
Therewrap_exceptions
decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.
Workflow ID: wflow_0SgPnESfFL0Scwfr
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
return similarity | ||
|
||
|
||
def maximal_marginal_relevance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an exact duplicate of the existing implementation.
- function
maximal_marginal_relevance
(mmr.py)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 93673b7 in 1 minute and 3 seconds
More details
- Looked at
1195
lines of code in21
files - Skipped
0
files when reviewing. - Skipped posting
5
drafted comments based on config settings.
1. agents-api/agents_api/queries/docs/get_doc.py:5
- Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in other files like 'list_docs.py', 'search_docs_by_embedding.py', and 'search_docs_by_text.py'. - Reason this comment was not posted:
Confidence changes required:50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.
2. agents-api/agents_api/queries/docs/list_docs.py:5
- Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in other files like 'search_docs_by_embedding.py', and 'search_docs_by_text.py'. - Reason this comment was not posted:
Confidence changes required:50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.
3. agents-api/agents_api/queries/docs/search_docs_by_embedding.py:6
- Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in 'search_docs_by_text.py'. - Reason this comment was not posted:
Confidence changes required:50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.
4. agents-api/agents_api/queries/docs/search_docs_by_text.py:6
- Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. - Reason this comment was not posted:
Confidence changes required:50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.
5. agents-api/agents_api/queries/docs/create_doc.py:98
- Draft comment:
The 'org' option is removed from the owner_type Literal, but this change is not reflected in the PR description. This should be documented for clarity. This change is also applicable in 'delete_doc.py', 'get_doc.py', and 'list_docs.py'. - Reason this comment was not posted:
Confidence changes required:50%
The 'org' option is removed from the owner_type Literal in multiple files, but this change is not reflected in the PR description. This should be documented for clarity.
Workflow ID: wflow_pnbwiXEmTCJVnbrw
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Incremental review on dc0ec36 in 48 seconds
More details
- Looked at
939
lines of code in28
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. agents-api/agents_api/queries/users/delete_user.py:59
- Draft comment:
Theasyncpg.exceptions.UniqueViolationError
should not be handled here as it is not relevant to delete operations. Consider removing this exception handling. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_WUrKdAU41fxw9Cd0
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
status_code=404, | ||
detail="The specified developer does not exist.", | ||
), | ||
asyncpg.exceptions.UniqueViolationError: partialclass( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The asyncpg.exceptions.UniqueViolationError
should not be handled here as it is not relevant to delete operations. Consider removing this exception handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 831e950 in 51 seconds
More details
- Looked at
440
lines of code in11
files - Skipped
0
files when reviewing. - Skipped posting
3
drafted comments based on config settings.
1. agents-api/agents_api/queries/docs/get_doc.py:36
- Draft comment:
Usingast.literal_eval
ond["content"]
can be unsafe if the content is not guaranteed to be a valid Python literal. Consider using a safer method to parse or handle the content. - Reason this comment was not posted:
Comment was on unchanged code.
2. agents-api/agents_api/queries/docs/embed_snippets.py:10
- Draft comment:
vectorizer_query
is set toNone
. This is a placeholder and should be replaced with an actual query before deployment. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is essentially repeating information that's already explicitly stated in a TODO comment one line above. The TODO comment is more visible and serves the same purpose. This makes the PR comment redundant and not adding any new information or value.
Perhaps the comment is trying to emphasize the importance of not deploying with a None query, which could be a critical issue.
While deployment concerns are valid, the existing TODO comment already makes it clear this needs to be replaced, and deployment issues would be caught by basic testing since the function would fail immediately.
Delete the comment as it's redundant with the existing TODO comment and doesn't provide additional actionable value.
3. agents-api/agents_api/queries/entries/list_entries.py:88
- Draft comment:
Ensuresort_by
anddirection
are validated and sanitized to prevent SQL injection, as they are used in string interpolation for SQL queries. - Reason this comment was not posted:
Comment did not seem useful.
Workflow ID: wflow_TPwdOI1YwvBHnQ9P
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 249513d in 1 minute and 4 seconds
More details
- Looked at
1148
lines of code in15
files - Skipped
0
files when reviewing. - Skipped posting
5
drafted comments based on config settings.
1. agents-api/agents_api/queries/docs/create_doc.py:153
- Draft comment:
The function assumesdata.content
is always a list. Consider adding a check or conversion to handle cases wheredata.content
might be a string or other type to prevent unexpected behavior. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment suggests adding a check that already exists in the code. The code already handles both list and non-list cases appropriately through the isinstance() check and separate logic paths. The comment appears to be incorrect in stating that the function "assumes data.content is always a list" when it clearly doesn't make that assumption.
Could there be other types besides list and non-list that need handling? Could the CreateDocRequest type definition enforce the content type making this check unnecessary?
The code's else branch handles any non-list type appropriately by treating it as a single content item. The type validation would be handled by FastAPI's request validation via CreateDocRequest if needed.
The comment should be deleted because it incorrectly suggests adding a check that already exists in the code. The code already properly handles both list and non-list content.
2. agents-api/agents_api/queries/docs/delete_doc.py:25
- Draft comment:
TheEXISTS
clause in the SQL query might be redundant since thedoc_owners
entry is already deleted in thedeleted_owners
CTE. Consider revising the logic to ensure thedocs
entry is only deleted if thedoc_owners
entry existed prior to deletion. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment raises an interesting point about the query logic - we delete from doc_owners first, then check if that same record existed in doc_owners before deleting from docs. However, this could actually be intentional behavior to ensure atomicity and proper ordering. Without deeper knowledge of the data model and requirements, we can't be certain this is actually a problem vs a deliberate safeguard.
I may be missing important context about transaction isolation levels and race conditions that could make this pattern necessary. The EXISTS check might serve as an important guard rail.
While the comment raises an interesting point about query structure, we don't have enough context to confidently say this is incorrect or needs to be changed.
This comment is speculative and requires more context about the data model and requirements to validate. Following our rules, we should err on the side of removing speculative comments.
3. agents-api/agents_api/queries/docs/list_docs.py:135
- Draft comment:
Themetadata_filter
is directly appended to the query string, which could lead to SQL injection if not properly handled. Ensure that metadata keys and values are safely included in the query to prevent SQL injection vulnerabilities. - Reason this comment was not posted:
Comment did not seem useful.
4. agents-api/agents_api/queries/docs/search_docs_by_text.py:19
- Draft comment:
Theowner_types
andowner_ids
are passed as JSONB arrays, which might not be correctly handled by the SQL function. Ensure that these arrays are properly converted to UUID arrays to prevent unexpected behavior. - Reason this comment was not posted:
Comment did not seem useful.
5. agents-api/tests/test_docs_queries.py:11
- Draft comment:
Consider adding tests forsearch_docs_by_embedding
andsearch_docs_hybrid
to ensure comprehensive coverage of the search functionalities. - Reason this comment was not posted:
Confidence changes required:50%
Thetest_docs_queries.py
file has a test forsearch_docs_by_text
but it lacks tests forsearch_docs_by_embedding
andsearch_docs_hybrid
. Adding these tests would ensure comprehensive coverage of the search functionalities.
Workflow ID: wflow_qeLRxNLHNTGhklta
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
PR Type
Enhancement
Description
Added comprehensive document management system with the following features:
Core document operations:
Advanced search capabilities:
List and filtering:
Performance optimizations:
Changes walkthrough 📝
9 files
__init__.py
Initialize document management module with core operations
agents-api/agents_api/queries/docs/init.py
create_doc.py
Document creation with ownership management
agents-api/agents_api/queries/docs/create_doc.py
constraints
delete_doc.py
Document deletion with ownership validation
agents-api/agents_api/queries/docs/delete_doc.py
get_doc.py
Single document retrieval functionality
agents-api/agents_api/queries/docs/get_doc.py
list_docs.py
Paginated document listing with filters
agents-api/agents_api/queries/docs/list_docs.py
mmr.py
Maximal Marginal Relevance implementation for search
agents-api/agents_api/queries/docs/mmr.py
search_docs_by_embedding.py
Vector-based document search implementation
agents-api/agents_api/queries/docs/search_docs_by_embedding.py
search_docs_by_text.py
Full-text document search implementation
agents-api/agents_api/queries/docs/search_docs_by_text.py
search_docs_hybrid.py
Hybrid document search with score fusion
agents-api/agents_api/queries/docs/search_docs_hybrid.py
Important
Add comprehensive document management system with CRUD, advanced search, and performance optimizations.
create_doc.py
,delete_doc.py
,get_doc.py
, andlist_docs.py
.search_docs_by_text.py
.search_docs_by_embedding.py
.search_docs_hybrid.py
.mmr.py
.Doc
model inDocs.py
with new fields likemodality
,language
,index
,embedding_model
, andembedding_dimensions
.test_docs_queries.py
for CRUD operations and listing.This description was created by for 249513d. It will automatically update as commits are pushed.